Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add TableWriterRepalyer #11100

Closed
wants to merge 1 commit into from

Conversation

duanmeng
Copy link
Collaborator

@duanmeng duanmeng commented Sep 26, 2024

Adds TableWriterReplayer to facilitate the replaying of TableWriter operator.
Uses the given plan node ID to find the traced TableWriteNode from the traced plan.
It helps create a new TableWriterNode and rebuild a query plan with a QueryTraceScanNode,
then apply the traced configurations, and rerun.

QueryTraceScanNode holds the traced data type and dir for a given plan node ID.
These information can be utilized to build the QueryTraceScan operator. It creates
a QueryDataReader using the traced data type and input data file. To find the right
input data file for replaying, we need to use both the pipeline ID and driver ID, which
are only known during operator creation, so we need to figure out the input traced
data file and the output type dynamically.

Part of #9668

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 26, 2024
@duanmeng duanmeng marked this pull request as draft September 26, 2024 03:16
Copy link

netlify bot commented Sep 26, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 9ab50f3
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6700117c4de77700080886ae

@duanmeng duanmeng force-pushed the replay_writer branch 2 times, most recently from 02877f4 to 63e9547 Compare September 26, 2024 07:29
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng overall looks good. Can you add e2e test for this? Thanks!

velox/exec/tests/utils/PlanBuilder.h Outdated Show resolved Hide resolved
velox/exec/trace/QueryTraceUtil.h Outdated Show resolved Hide resolved
velox/exec/trace/QueryTraceUtil.h Outdated Show resolved Hide resolved
velox/exec/trace/QueryTraceUtil.h Outdated Show resolved Hide resolved
velox/exec/trace/QueryTraceUtil.h Show resolved Hide resolved
velox/exec/trace/QueryReplayScan.h Outdated Show resolved Hide resolved
@duanmeng duanmeng force-pushed the replay_writer branch 15 times, most recently from a5f4023 to 42091fd Compare October 1, 2024 15:03
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng looks good overall. Some minors

velox/exec/tests/utils/PlanBuilder.h Outdated Show resolved Hide resolved
velox/exec/trace/QueryDataReader.h Outdated Show resolved Hide resolved
velox/exec/trace/QueryDataReader.cpp Outdated Show resolved Hide resolved
velox/exec/trace/QueryTraceUtil.h Outdated Show resolved Hide resolved
velox/exec/trace/QueryTraceUtil.h Outdated Show resolved Hide resolved
velox/tool/trace/TableWriterReplayer.h Show resolved Hide resolved
core::PlanNodePtr createPlan() const override;

private:
const std::string targetDir_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we name traceDir_ to be consistent with naming in other places?

Copy link
Collaborator Author

@duanmeng duanmeng Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

targetDir_ is the output directory when we replay the TableWriter. Rename it to replayOutputDir_?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it to replayOutputDir_

const auto traceRoot = fmt::format("{}/{}", rootDir_, taskId_);
return PlanBuilder()
.traceScan(
fmt::format("{}/{}", traceRoot, nodeId_),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

traceNodeDir as name for traceScan input? thanks!

velox/tool/trace/TableWriterReplayer.cpp Show resolved Hide resolved
const std::vector<std::string>& partitionKeys,
const RowTypePtr& rowType) {
ASSERT_EQ(actualDirs.size(), expectedDirs.size());
auto iterActual = actualDirs.begin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/iterActual/actualDirIt/
s/iterExpected/expectedDirIt/

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@duanmeng duanmeng marked this pull request as ready for review October 3, 2024 10:13
@duanmeng duanmeng force-pushed the replay_writer branch 2 times, most recently from 0d15d96 to ae33331 Compare October 3, 2024 11:48
@duanmeng duanmeng changed the title Add replay for TableWriter Add TableWriterRepalyer Oct 3, 2024
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duanmeng thanks for the update. LGTM!

velox/exec/trace/QueryTraceUtil.h Outdated Show resolved Hide resolved
@@ -31,17 +33,46 @@ DEFINE_string(
task_id,
"",
"Specify the target task id, if empty, show the summary of all the traced query task.");
DEFINE_string(node_id, "", "Specify the target node id.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need these flags?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there may be multiple traced node dir in $rootDir/$taskId.

velox/tool/trace/OperatorReplayerBase.cpp Outdated Show resolved Hide resolved
velox/tool/trace/QueryReplayer.cpp Show resolved Hide resolved
velox/tool/trace/QueryReplayer.cpp Outdated Show resolved Hide resolved
velox/tool/trace/QueryReplayer.cpp Show resolved Hide resolved
@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xiaoxmeng merged this pull request in a933331.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants